Thermodynamics of Aggregation of Two Proteins

Kazuki Nakanishi and Macoto Kikuchi

Department of Physics, Osaka University, Toyonaka 560-0043

We investigate the aggregation mechanism of two proteins in a thermodynamically unambiguous
manner by considering the finite-size effect on the free-energy landscape of the HP lattice protein
model. The Multi-Self-Overlap-Ensemble Monte Carlo method is used for the numerical calculations.
We find that a dimer can form spontaneously as a thermodynamically stable state when
the system is small enough. This implies that the aggregation of proteins in a cell may be
triggered when they are confined in a small region by, for example, being surrounded by other
macromolecules. We also find that the dimer exhibits a transition between an unstable state and
a metastable state in the infinite system.

1. Introduction

Proteins are polymers consisting of amino acids. Un- 
der the physiological condition, each protein folds into 
a particular conformation called the native structure, 
which determines its function. The native structure is 
composed of characteristic local structures called the α-helix
and the β-sheet, both of which are stabilized by hydrogen
bonds. It has been shown experimentally by Anfinsen
that the native structure is a thermodynamically stable
state (Anfinsen's dogma). Protein folding has
been a subject of statistical physics ever since. The
energy landscape theory developed in the last decade has 
opened a new direction of the protein folding study. Ac- 
cording to this theory, the landscape of the free-energy 
of a protein is optimized for fast folding, and the folding
is just a relaxation process in the free-energy landscape. 
The theory has successfully described the folding of small 
globular proteins. The free-energy landscape is now con- 
sidered as a key for understanding the folding process. 

Some proteins work by forming clusters in vivo, rather 
than as a protein monomer. For example, hemoglobin 
of red blood cells forms a tetramer to transport oxygen. 
On the other hand, a protein sometimes loses its function
by forming aggregates. Moreover, fibers of aggregates
called amyloid fibrils cause some diseases. A well-known
example is the amyloid fibril of the prion protein,
which is considered to cause brain amyloidoses such as
Creutzfeldt-Jakob disease and bovine spongiform
encephalopathy. According to the widely accepted scenario,
the prion protein has two different types of folded
structure. It normally takes the monomeric native structure,
which is harmless, but the very same protein occasionally
takes an abnormally folded structure, which tends
to aggregate. Amyloid fibrils have been studied extensively
by experiments, and the following properties were
found: they (1) form when the protein density is high,
(2) form when the pH is shifted from the physiological
condition and the native structure is made unstable as
a result, and (3) contain β-sheets. This third point implies
that a change from the monomeric native state to
the abnormal fold is accompanied by a conversion from
α-helix to β-sheet.

Density and temperature dependences of the stability
of aggregates have been investigated theoretically by using
lattice protein models both on the 2D square lattice
and on the 3D simple-cubic lattice, and also by the Protein
Intermediate Resolution Model. The following two results
were obtained in common: (1) the native structures of
monomers are stable when both the protein density ρ
and the temperature T are low; (2) aggregates
form when ρ is high and T is low. Harrison et al. found
further that a protein has a stronger tendency to form
a homodimer when its lowest energy state is less stable.

Harrison et al. studied the free-energy landscape of
a system containing two proteins by Monte Carlo simulations
of a 2D lattice model to discuss the stability of the
aggregates. In that study, however, the two proteins were
forced to stay in contact, and thus the samples taken in the
simulation are considered to be biased. We should remember
that a system consisting only of two proteins
is thermodynamically abnormal, since it does not have 
a well-defined thermodynamic limit. In fact, because of 
the translational entropy, a dimer cannot exist as a stable 
state at a finite temperature in an infinite size system. As 
long as the system size is kept finite, on the other hand, 
thermodynamical states can be defined unambiguously. 
Thus, it is essential to deal with a finite system for in- 
vestigating the thermodynamics of a two protein system. 
This point of view, however, has not been emphasized in
previous works, as far as we are aware.

In this paper, we study the aggregation mechanism of
two proteins in a thermodynamically unambiguous manner
by explicitly analyzing the system size dependence
of the free-energy landscape of the HP lattice protein
model. For efficient sampling of the configurations, we
use the Multi-Self-Overlap-Ensemble Monte Carlo (MSOE)
method proposed by Chikenji et al.; in fact, this is
the first application of MSOE to a multi-protein system.
The paper is organized as follows. The model and the
method are described in the next section, where three types
of free-energy landscape are introduced. Results
of the calculations are presented in §3. In the final
section, we discuss the thermodynamic mechanism
of aggregation based on the results.

2. Model and Method 

We use the square-lattice HP model defined as follows:
(1) Each amino acid residue is represented as a
bead that occupies one site of the square lattice. Two
amino acids connected by a peptide bond sit at
adjacent sites; namely, the length of the peptide bond
is taken as the lattice constant. A conformation of
a protein consisting of N amino acids is then represented as
a self-avoiding walk of length N. (2) Each amino acid
is one of two types, H (hydrophobic) or P (polar). (3)
Only two non-bonded H beads that are nearest neighbors
have energy equal to -1. Such a pair of H beads is said to
make an HH-contact. Other contacts, HP and PP, do not
contribute to the energy. The contact energy is then given as
follows:

$$E = \sum_{i<j-1} u(S_i, S_j)\, \Delta(r_i, r_j), \qquad (1)$$

where r_i represents the position of the i-th residue.
Δ(r_i, r_j) is equal to 1 if r_i and r_j are nearest neighbors;
otherwise, Δ(r_i, r_j) is equal to 0. S_i stands for H
or P, and u(S_i, S_j) represents the interaction energy between
the i-th amino acid and the j-th one:

$$u(\mathrm{H},\mathrm{H}) = -1, \quad u(\mathrm{H},\mathrm{P}) = u(\mathrm{P},\mathrm{H}) = u(\mathrm{P},\mathrm{P}) = 0. \qquad (2)$$


This definition of the energy takes only the effect of hy- 
drophobicity into account. It should be noted that the 
interaction with the solvent is included in the contact 
energy effectively, and thus the contact energy actually 
is the excess free energy of forming an HH-contact. 
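
As an illustration of eqs. (1) and (2), the following is a minimal sketch of how the contact energy could be evaluated for one or two chains. The function name `hp_energy` and the arguments `u_intra` and `u_inter` are hypothetical names introduced here; the sketch assumes plain (unwrapped) lattice coordinates and ignores the periodic boundary.

```python
from itertools import combinations

def hp_energy(chains, sequences, u_intra=-1.0, u_inter=-1.0):
    """Contact energy of one or more HP chains on the square lattice.

    chains    : list of chains, each a list of (x, y) lattice sites
    sequences : list of strings over {'H', 'P'}, one per chain
    u_intra   : energy of an intra-chain HH-contact (assumed name)
    u_inter   : energy of an inter-chain HH-contact (assumed name)
    """
    # Flatten to (position, residue type, chain index, residue index).
    beads = [(pos, seq[k], c, k)
             for c, (chain, seq) in enumerate(zip(chains, sequences))
             for k, pos in enumerate(chain)]
    energy = 0.0
    for (p, s, c, k), (q, t, d, l) in combinations(beads, 2):
        if s != 'H' or t != 'H':
            continue                      # only HH pairs contribute
        if abs(p[0] - q[0]) + abs(p[1] - q[1]) != 1:
            continue                      # not nearest neighbors
        if c == d and abs(k - l) == 1:
            continue                      # bonded pair, excluded from eq. (1)
        energy += u_intra if c == d else u_inter
    return energy
```

Passing `u_inter=-1.5` would correspond to a stronger inter-chain attraction than the intra-chain one.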

Two HP sequences are studied in the following
sections. Sequence 1 is H^PHPHPHP^HP^H
(15 residues), which was used by Harrison et al. The
monomer of this sequence has the nondegenerate lowest
energy structure (E = -6) shown in Fig. 1. The same
interaction energy, eq. (2), is applied also to the inter-chain
contacts when two chains are considered. The lowest
energy of two chains is then that of a dimer (E = -14), which
is three-fold degenerate as shown in Fig. 2. Sequence
2 is PHP_2HP_2H_2P_2H_2P_2HP_2HP (20 residues), used by
Gupta et al. Figure 3 shows the lowest energy structure
(E = -8) of the monomer of this sequence, which
is also nondegenerate. The lowest energy of a dimer, however, is
equal to that of two monomers if eq. (2) is applied to the
inter-chain contacts. In order to make the lowest energy
structures of a dimer more stable than two independent
monomers, we change u(H,H) to -1.5 only for the inter-chain
HH-contacts. As a result of this modification, the
lowest energy becomes E = -18. There are many conformations
of the lowest energy, some of which are shown
in Fig. 4. For neither of the two sequences are the lowest
energy dimer states simply composed of two native states
of monomers. In other words, the monomers are required
to unfold partially before forming a dimer.

We introduce the following three types of free energy, 
all of which characterize the aggregation process: 




Fig. 1. The lowest energy structure of the monomer of
sequence 1 (E = -6). Black and white beads represent H and P
residues, respectively.

Fig. 2. All the lowest energy structures of the dimer of
sequence 1 (E = -14). Black and white beads represent H and P
residues, respectively.

Fig. 3. The lowest energy structure of the monomer of
sequence 2 (E = -8). Black and white beads represent H and P
residues, respectively.

Fig. 4. Examples of the lowest energy structures of the dimer of
sequence 2 (E = -18). Black and white beads represent H
and P residues, respectively.



(1) The free energy as a function of the distance R between
the centers of mass of two chains:

$$F(R) = -T \log \int_{R-\Delta r<r<R+\Delta r} e^{-E/T}\, dE. \qquad (3)$$

This function is expected to behave asymptotically
for large R as F(R) ~ -T log R, because it is composed
of the translational entropy and the free energy
of monomers when two chains are far apart and
cannot interact.

(2) The free energy as a function of R and the number
of inter-chain HH-contacts N_inter. Since N_inter ≠ 0
indicates that a dimer is formed, we call this free
energy F_di, where the suffix stands for dimer:

$$F_{di}(N_{inter}, R) = -T \log \int_{R-\Delta r<r<R+\Delta r,\ N_{inter}} e^{-E/T}\, dE. \qquad (4)$$

The bin size Δr = 0.5 is used throughout the present
work.

(3) The free energy as a function of R and the number
of intra-chain HH-contacts N_intra, calculated only for
configurations with N_inter = 0, namely two independent
monomers. We call this free energy F_mono:

$$F_{mono}(N_{intra}, R) = -T \log \int_{R-\Delta r<r<R+\Delta r,\ N_{intra}} e^{-E/T}\, dE. \qquad (5)$$

In the actual calculations, a summation over Monte Carlo
samples is taken instead of the integration.
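
The replacement of the integral by a sum over samples can be sketched as follows for F(R). This is only an illustrative sketch: the function name `free_energy_profile` and the argument `log_w` (the logarithm of the reweighting factor of each sample, for example -E/T minus the logarithm of the sampling weight) are assumptions introduced here, and the bins are taken as [R - Δr, R + Δr) around centers R = Δr, 3Δr, and so on.

```python
import numpy as np

def free_energy_profile(R, log_w, T, dr=0.5):
    """Estimate F(R) = -T log(sum of sample weights in each R bin).

    R     : center-of-mass distance of each Monte Carlo sample
    log_w : log of the reweighting factor of each sample (assumed input)
    T     : temperature (k_B = 1)
    dr    : half-width of a bin
    Returns (bin centers, F), with F defined up to an additive constant.
    """
    R = np.asarray(R, dtype=float)
    log_w = np.asarray(log_w, dtype=float)
    width = 2.0 * dr
    idx = np.floor(R / width).astype(int)      # bin index of each sample
    n_bins = idx.max() + 1
    log_Z = np.full(n_bins, -np.inf)           # log of the summed weights
    np.logaddexp.at(log_Z, idx, log_w)         # stable log-sum-exp per bin
    centers = (np.arange(n_bins) + 0.5) * width
    return centers, -T * log_Z                 # empty bins give +inf
```

The two-variable landscapes F_di and F_mono could be estimated in the same way by binning on (N_inter, R) or (N_intra, R) instead of R alone.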

It is difficult to obtain the lowest energy conformations 
by Monte Carlo simulations because of the topological 
barrier due to the excluded volume effect. In order to 
overcome this difficulty, we employ MSOE, in which the 
excluded volume condition is systematically weakened. 
For that purpose, we define an effective energy associated 
with overlap as follows: V = Σ_i V_i, with V_i = (M_i - 1)^2 for M_i ≥ 1 and
V_i = 0 for M_i = 0, where M_i is the number of amino acids
on the i-th lattice site. The procedure of MSOE for two
chains is the following: 

(1) M_i is counted without distinction between the two
chains.

(2) The bivariate multicanonical Monte Carlo method is applied
to obtain a flat distribution for both V and E.

(3) By taking only non-overlapping conformations, that
is, samples with V = 0, the standard multicanonical
ensemble is obtained.

(4) Free energy and other thermodynamic quantities 
are calculated through the histogram reweighting 
method. 
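
Step (1) of the procedure can be sketched as below. The sketch assumes that the overlap energy takes the quadratic form (M_i - 1)^2 summed over occupied sites, which is an assumption about the garbled definition above, and the function name `overlap_energy` is hypothetical.

```python
from collections import Counter

def overlap_energy(chains, L):
    """Overlap energy V of several chains on an L x L periodic lattice.

    chains : list of chains, each a list of (x, y) integer coordinates
    L      : edge length of the lattice
    Assumes V = sum over occupied sites of (M_i - 1)**2, where M_i is the
    number of residues on site i, counted without distinguishing chains.
    """
    occupancy = Counter((x % L, y % L) for chain in chains for (x, y) in chain)
    return sum((m - 1) ** 2 for m in occupancy.values())
```

A conformation with `overlap_energy(...) == 0` is a physical, non-overlapping one, which is the subset kept in step (3).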

We treat systems of different sizes. For the sequence 
1, the edge lengths L of the systems are 60, 80, and 100 
(the unit of the length is the lattice constant). For the 
sequence 2, L = 60, 80, 100, and 120. Periodic boundary 
conditions are imposed.
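
With periodic boundaries, the distance R between the centers of mass has to be measured with the minimum-image convention. A minimal sketch is given below; the function name `com_distance` is hypothetical, and each chain is assumed to be stored in unwrapped coordinates so that its center of mass is well defined.

```python
import numpy as np

def com_distance(chain_a, chain_b, L):
    """Minimum-image distance between the centers of mass of two chains.

    chain_a, chain_b : sequences of (x, y) sites in unwrapped coordinates
    L                : edge length of the periodic square lattice
    """
    a = np.mean(np.asarray(chain_a, dtype=float), axis=0)
    b = np.mean(np.asarray(chain_b, dtype=float), axis=0)
    d = b - a
    d -= L * np.round(d / L)          # fold each component into [-L/2, L/2]
    return float(np.hypot(d[0], d[1]))
```

Under this convention R cannot exceed L/√2, and the full ring of radius R fits inside the periodic cell only for R < L/2.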

3. Results

3.1 Sequence 1 

Figure 5 shows the specific heat per residue C(T) of
two chains for L = 60, 80, and 100, and that of the
monomer of sequence 1. C(T) of the monomer exhibits
a single peak. The monomer folds into the native
state at temperatures below the peak. Thus we call
the temperature of this peak the folding temperature T_f.
On the other hand, two peaks are seen in C(T) of the
two chains; one is at T_f and an additional one is at a lower
temperature. As we will see later from F_di, the two chains
change into a dimer at the additional peak. This peak
shifts to lower temperature as L increases. We call the
temperature of this peak the dimerization temperature
T_d(L), below which the dimer state is thermodynamically
stable.

Fig. 5. Specific heat per residue of two chains for L = 60, 80, and
100, and that of the monomer of sequence 1. T_f is the folding
temperature of the monomer, and T_d(L) is the size-dependent
dimerization temperature. The unstable-metastable transition
temperature, described in §4, is indicated by T_c.

Fig. 6. F(R) of sequence 1 at T = 1.0 for L = 60, 80, and
100. -T log R is also shown.
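
A specific heat curve of this kind can be obtained from an estimated density of states by reweighting. The following is a minimal sketch under the assumption that the logarithm of the density of states is available on a discrete set of energies; the function name `specific_heat` and its arguments are hypothetical, and k_B = 1 is used as in the free-energy definitions.

```python
import numpy as np

def specific_heat(energies, log_g, T, n_residues):
    """Specific heat per residue, C = (<E^2> - <E>^2) / (T^2 N).

    energies   : discrete energy levels E
    log_g      : log of the density of states at each level (assumed input)
    T          : temperature (k_B = 1)
    n_residues : total number of residues N in the system
    """
    E = np.asarray(energies, dtype=float)
    log_p = np.asarray(log_g, dtype=float) - E / T
    log_p -= log_p.max()              # avoid overflow before exponentiating
    p = np.exp(log_p)
    p /= p.sum()                      # canonical probabilities at temperature T
    mean_E = np.dot(p, E)
    var_E = np.dot(p, (E - mean_E) ** 2)
    return var_E / (T ** 2 * n_residues)
```

Scanning T and locating the maxima of the returned values would give peak temperatures such as T_f and T_d(L).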

Figures 6 and 7 show F(R) at high temperature (T =
1.0 > T_f) and at T_d(60), respectively. Since F(R) is
calculated up to an arbitrary additive constant, we adjusted
it so that F(R) coincides with -T log R for large
R. F(R) at high temperature shows the following features:
(1) F(R) ≈ -T log R for R larger than about 10.
This is a rather trivial consequence of the translational
entropy and indicates that the Monte Carlo samples are taken
properly. (2) F(R) turns to increase for R > L/2 because
of the reduction of the translational entropy due to the
finite-size effect. As a result, F(R) has a single minimum
at R = L/2. F(R) at T_d(60), on the other hand, exhibits
an additional minimum, which corresponds to the
dimer state. In other words, the independent monomer
state and the dimer state are separated by a free-energy
barrier. F(R) near the dimer state exhibits no evident
system size dependence. The dimer state thus can be either
stable or metastable, depending on the system size.
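
The purely translational part of F(R), including its minimum near R = L/2, can be reproduced by counting relative displacements on the periodic lattice. The sketch below treats the two chains as point particles and ignores all chain structure, which is a simplifying assumption made only for illustration; the function name `translational_free_energy` is hypothetical.

```python
import numpy as np

def translational_free_energy(L, T, dr=0.5):
    """-T log(number of relative displacements) in each R bin.

    L  : edge length of the periodic square lattice
    T  : temperature (k_B = 1)
    dr : half-width of a bin
    Returns (bin centers, F).
    """
    d = np.arange(L) - L // 2                 # minimum-image components
    dx, dy = np.meshgrid(d, d)
    r = np.hypot(dx, dy).ravel()              # all L*L relative distances
    width = 2.0 * dr
    idx = np.floor(r / width).astype(int)
    counts = np.bincount(idx)                 # number of sites per ring
    centers = (np.arange(counts.size) + 0.5) * width
    with np.errstate(divide="ignore"):
        F = -T * np.log(counts)               # empty rings give +inf
    return centers, F
```

For R < L/2 the ring count grows roughly as 2πR × 2Δr, which gives the -T log R behavior; beyond L/2 the ring is cut by the periodic cell and the count drops.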

The dimerization process can be deduced from F_di and
F_mono, shown in Figs. 8 and 9, respectively, for L = 60 at
T = T_d(60). F_di shows that the behavior of the two chains
changes at R = R_c ≈ 5. An inter-chain HH-contact is
hardly formed for R > R_c, while a dimer (N_inter ≠ 0) is
more stable than two monomers (N_inter = 0) for R < R_c.
The three lowest energy dimer configurations are in the minima
of the free energy indicated by the arrows. The independent
monomer state and the dimer state are separated
by a free-energy barrier at (N_inter, R) = (1, R_c).
Thus one (or at most two) inter-chain contact is formed
at the transition state. F_mono suggests that the two chains
are mostly in the native state individually for R > R_c,
while the native state becomes unstable rapidly as R
decreases for R < R_c. Considering these two
figures together with the fact that the lowest energy state of
the monomers must unfold partially to become
a dimer, we can infer that the dimerization takes place
through the following processes: (1) Two individually
folded chains can approach each other as close as R ≈ R_c.
(2) Once one or two inter-chain contacts are formed through
partial unfolding by thermal fluctuation, the chains fold
into a dimer rapidly.

J. Phys. Soc. Jpn. Full Paper

Fig. 7. F(R) of sequence 1 at T_d(60) = 0.21 for L = 60, 80,
and 100. -T log R is also shown.

Fig. 8. F_di(N_inter, R) of sequence 1 (T = T_d(60), L = 60).
Arrows indicate the locations of the three lowest energy dimer states.

Fig. 9. F_mono(N_intra, R) of sequence 1 (T = T_d(60), L = 60).

3.2 Sequence 2 

Figure 10 shows C(T) of two chains for L = 60, 80, 100,
and 120, and that of the monomer of sequence 2. In contrast
to sequence 1, only one peak is seen in C(T) of
the two chains, which is located at a higher temperature
than T_f. This peak shifts to lower temperature as L increases.
As we will see later in F_di, the two chains fold into a
dimer near the peak. Thus the temperature of this peak
corresponds to T_d(L). In this case, the dimerization
takes place at a higher temperature than T_f as a result of
the larger inter-chain HH-contact energy; the monomeric
native state is never realized as a thermodynamically stable
state.

Fig. 10. Specific heat per residue of two chains for L = 60, 80, 100,
and 120, and that of the monomer of sequence 2. T_f is the folding
temperature of the monomer, and T_d(L) is the size-dependent
dimerization temperature. The unstable-metastable transition
temperature, described in §4, is indicated by T_c.

Figures 11 and 12 show F(R) at high temperature
(T = 1.0 > T_f) and at T_d(60), respectively. The qualitative
features are similar to those of sequence 1. Thus, a
dimer of this sequence is also either a stable state or a
metastable state for T < T_d(L), depending on the system
size. F_di and F_mono for L = 60 at T = T_d(60)
are shown in Figs. 13 and 14, respectively. The qualitative
features are again similar to those of sequence 1. The
largest difference is seen in F_mono: the individual
monomers are partially unfolded for this sequence, which
is a natural consequence of T_d(60) > T_f. In contrast to
sequence 1, the lowest energy dimer states constitute a
broad ensemble distributed around the free-energy
minimum. The transition state between the individual
monomer state and the dimer state is again located at
(N_inter, R) = (1, 5). A possible scenario for forming a
dimer is also similar to that of sequence 1, except that the
individual monomers are partially unfolded even before
making an inter-chain contact.

Fig. 11. F(R) of sequence 2 at T = 1.0 for L = 60, 80, 100,
and 120. -T log R is also shown.

Fig. 12. F(R) of sequence 2 at T_d(60) = 0.36 for L = 60, 80,
100, and 120. -T log R is also shown.

Fig. 13. F_di(N_inter, R) of sequence 2 (T = T_d(60), L = 60).

Fig. 14. F_mono(N_intra, R) of sequence 2 (T = T_d(60), L = 60).



4. Summary and Discussion 

We have investigated the free-energy landscape of
two-protein systems in a thermodynamically unambiguous
way by considering the finite-size effect properly.
We found that a dimer is thermodynamically stable for
T < T_d(L), and that T_d(L) decreases as L becomes larger.

On the basis of the present results, we can discuss 
the stability of a dimer in, for example, a confined sys- 
tem such as two proteins surrounded by other macro- 
molecules in a cell, although the periodic boundary con- 
ditions were actually used in the present work. Then the 
following scenario for the early stage of aggregation is 
suggested: if two proteins are confined in a sufficiently 
small region, a dimer is formed spontaneously because 
it is thermodynamically stable rather than metastable;
once a dimer is formed, it will survive for a while as a
metastable state even after being released from the confined
region, because of the free energy barrier. It should be 
noted that although this scenario seems to be consistent 
with results of experiments and other theoretical stud- 
ies on aggregation, what we have discussed above is not 
just a condition for dimerization to take place such as 
high density, but a plausible mechanism of spontaneous 
dimerization based on thermodynamic stability. 

We can now readily consider the limit L → ∞. As we
have already seen, F(R) shows the asymptotic -log R
dependence as long as R < L/2, while F(R) near the
dimer state hardly depends on the system size. Thus the
minimum free energy of the independent monomer state
is proportional to -log L and can become infinitely low
in this limit. The dimerization temperature T_d(L), which
is determined by the free-energy balance between these
two states, is expected to vanish at the same time. The
folding temperature T_f, on the other hand, does not depend
on the system size and stays finite. Accordingly,
as the system size increases, T_d(L) should eventually become
lower than T_f irrespective of the sequence. As a
result, the single peak observed in C(T) for sequence
2, which corresponds to the dimerization, will split into
two as for sequence 1 if the system becomes sufficiently
large that the folding of the monomer takes place at a higher
temperature than the dimerization. Although the dimer can
never exist as a thermodynamically stable state in the
infinite-size limit, the stability of the dimer actually changes
with temperature: the dimer is unstable at high temperature,
while it becomes metastable at low temperature.
This change is not a phase transition in the strict sense,
but we may call it the unstable-metastable transition of the
dimer. The temperature at which this transition takes
place is indicated as T_c in Figs. 5 and 10. As seen in the
figures, no significant effect is observed in C(T) at T_c.
This temperature, however, still has a well-defined physical
meaning: the dimer never becomes stable above
it even in a finite system.

Recently, Levy et al. have studied the formation of a
homodimer for several different proteins based on
free-energy landscape calculations. They used an off-lattice
model with Go-like interactions for both intra- and inter-chain
interactions. Thus the ground state of the dimer is
unique, and the same intra-chain contacts are formed in
the monomeric native state and the dimer ground state.
System size dependence was not explicitly considered, as in
the previous works described in the introduction. Instead,
the two chains are made inseparable by additional
virtual forces. Therefore, their model and computational
framework are totally different from those of the present
work. It is still informative, however, to compare their results
with the present ones. They found that the structure
of the free-energy landscape differs largely among different
proteins, reflecting a variety of binding mechanisms.
Among them, the free-energy landscape of λ repressor
shares a common feature with that of sequence 1. In
fact, according to their result, two independently folded
λ repressors change into a dimer when they come close
enough. This protein is known to form a dimer through the
so-called induced-fit mechanism. So, we may say that
sequence 1 studied in the present work also exhibits
induced fit into one of the three ground states. The folding
process of sequence 2 differs from that of sequence 1
in that the monomers are partially unfolded before forming
a dimer. It seems to correspond to the folding-binding
reaction of, for example, Arc repressor. We should stress,
however, that the folding process is expected to change
into the induced-fit type as for sequence 1 if the system becomes
large enough that the dimerization temperature
is lower than the folding temperature.

Resemblances to and differences from the crowding effect
on protein folding are worth mentioning. It has been
pointed out by Takagi et al. that the free-energy landscape
of a protein is modified when it is confined in a small
region, and faster folding is achieved as a result. The
present study may be regarded as its dimer counterpart.
In contrast to that study, however, we found that the
metastable state of a dimer can change to the stable state
as a consequence of confinement.

As a final remark, we stress again that the finite-size
effect is a key for understanding the aggregation of proteins.
The method proposed in this paper can be applied
straightforwardly to the aggregation of more realistic
protein models and that of three or more proteins.